Topic-Specific Scoring of Documents for Relevant Retrieval

Authors

  • Wray Buntine
  • Jaakko Löfström
  • Sami Perttu
  • Kimmo Valtonen
Abstract

Semantic component analysis (LSA, PLSA, discrete PCA, etc.) has had mixed success in information retrieval. Here we combine topic-specific link analysis with discrete PCA (a semantic component method) to develop a topic relevancy score for information retrieval, which is used to post-filter documents retrieved via regular Tf.Idf methods. Combined with a novel and intuitive “topic by example” interface, this gives users a friendly way to include topic relevance in search. To evaluate the resulting topic- and link-based scoring, a demonstration has been built using Wikipedia, the public domain encyclopedia on the web.
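The post-filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain TF-IDF sum for the first-stage retrieval and swaps in personalized PageRank as a stand-in for the topic-specific link analysis. The per-document topic weights, which in the paper would come from discrete PCA, are set by hand here, and all function names and data are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against the query with a plain TF-IDF sum."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(sum(tf[t] * math.log(n / df[t])
                          for t in query.lower().split() if t in tf))
    return scores

def topic_pagerank(links, topic_weight, damping=0.85, iters=50):
    """Personalized PageRank: teleport to documents in proportion to
    their (assumed, hand-set) weight for a single topic."""
    n = len(topic_weight)
    total = sum(topic_weight)
    teleport = [w / total for w in topic_weight]
    rank = teleport[:]
    for _ in range(iters):
        new = [(1 - damping) * t for t in teleport]
        for src, outs in enumerate(links):
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: spread its mass along the teleport vector.
                for i in range(n):
                    new[i] += damping * rank[src] * teleport[i]
        rank = new
    return rank

def post_filter(query, docs, links, topic_weight, k=3):
    """Retrieve the top-k TF-IDF matches, then reorder them by topic score."""
    base = tfidf_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: base[i], reverse=True)[:k]
    top = [i for i in top if base[i] > 0]
    topical = topic_pagerank(links, topic_weight)
    return sorted(top, key=lambda i: topical[i], reverse=True)

# Hypothetical toy collection: two pages mention "jaguar", one per sense.
docs = [
    "jaguar car engine speed",
    "jaguar cat jungle habitat",
    "jungle cat habitat predator",
    "car engine repair",
]
links = [[3], [2], [1], [0]]            # links[i] lists the pages that page i links to
topic_weight = [0.05, 0.9, 0.9, 0.05]   # assumed weights for an "animals" topic

print(post_filter("jaguar", docs, links, topic_weight))
```

Both "jaguar" pages tie on TF-IDF, so the topic score decides the order: with the "animals" weights, the page about the cat is ranked ahead of the page about the car.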


Similar resources

Topic-Specific Link Analysis using Independent Components for Information Retrieval

There has been mixed success in applying semantic component analysis (LSA, PLSA, discrete PCA, etc.) to information retrieval. Previous experiments have shown that high-fidelity language models do not imply good quality retrieval. Here we combine link analysis with discrete PCA (a semantic component method) to develop an auxiliary score for information retrieval that is used in post-filtering d...


The Relative generality and precision of Evidence Based Medical Information Resources in the Recovery of Diabetes Information

Background and Aim: Relative generality and precision are two important criteria for measuring the efficiency and performance of information retrieval systems. The aim of this study was to compare the integrity and location of evidence-based bases in the digital library of Hamedan University of Medical Sciences in data retrieval of diabetes. Methods: The design of this research is cross-sect...


Effects of Language and Topic Size in Patent IR: An Empirical Study

We revisit the effects that various characteristics of the topic documents have on the effectiveness of the systems for the task of finding prior art in the patent domain. In doing so, we provide the reader interested in approaching the domain a guide of the issues that need to be addressed in this context. For the current study, we select two patent based test collections with a common documen...


High-Recall Document Retrieval from Large-Scale Noisy Documents via Visual Analytics based on Targeted Topic Modeling

We present a visual analytics system for large-scale document retrieval tasks with high recall where any missing relevant documents can be critical. Our system utilizes a novel user-driven topic modeling called targeted topic modeling, a variant of nonnegative matrix factorization (NMF). Our system visualizes a topic summary in a treemap form and lets users keep relevant topics and incrementall...


Topic Analysis for Psychiatric Document Retrieval

Psychiatric document retrieval attempts to help people to efficiently and effectively locate the consultation documents relevant to their depressive problems. Individuals can understand how to alleviate their symptoms according to recommendations in the relevant documents. This work proposes the use of high-level topic information extracted from consultation documents to improve the precision o...




Publication date: 2005